Abstract: - The CORDIC algorithm is a simple, iterative process for performing various mathematical
computations. Most of the literature does not report the resources utilized by a particular CORDIC
architecture. In this paper, serial, parallel and pipelined CORDIC architectures have been implemented for
computing both the sin and cos functions.




I. Introduction 

The CORDIC algorithm is an iterative algorithm that can be used for the computation of trigonometric
functions, multiplication and division [1]. The last half century has witnessed a lot of progress in the design
and development of architectures of the algorithm for high-performance and low-cost hardware solutions. The
CORDIC algorithm gained popularity when [2] showed that, by varying a few simple parameters, it could be used
as a single algorithm for the unified implementation of a wide range of elementary transcendental functions
involving logarithms, exponentials, and square roots. Around the same time, [3] showed that the CORDIC
technique is a better choice for scientific calculator applications.

The popularity of CORDIC was greatly enhanced thereafter, primarily due to its potential for
efficient and low-cost implementation. With the advent of low-cost, low-power FPGAs, the algorithm has
again shown this potential. The CORDIC algorithm can be widely used in wireless communications, Software
Defined Radio and medical imaging applications, which are heavily dependent on signal processing.

Although CORDIC may not be the fastest technique to perform these operations, it is attractive due
to its simplicity and efficient hardware implementation.

The development of CORDIC algorithms and architectures has aimed at achieving a high throughput
rate and reducing hardware complexity as well as the latency of implementation. Latency is an inherent
drawback of the conventional CORDIC algorithm. Angle recoding schemes and higher-radix CORDIC have been
developed for reduced-latency realization. Parallel and pipelined CORDIC have been suggested for
high-throughput computation. CORDIC computation is inherently sequential due to two main bottlenecks:
firstly, the micro-rotation for any iteration is performed on the intermediate vector computed by the
previous iteration, and secondly, the (i+1)th iteration can be started only after the completion of the ith
iteration, since the direction of rotation required to start the (i+1)th iteration is known only after the
completion of the ith iteration. To alleviate the second bottleneck, some attempts have been made to evaluate
the rotation directions corresponding to small micro-rotation angles [4]. However, the CORDIC iterations
still cannot be performed in parallel due to the first bottleneck. A partial parallelization has been realized
in [4] by combining a pair of conventional CORDIC iterations into a single merged iteration, which provides
better area-delay efficiency. But the accuracy is slightly affected by such merging, and the approach cannot be
extended to a higher number of conventional CORDIC iterations since the induced error becomes unacceptable [5].
Parallel realization of CORDIC iterations to handle the first bottleneck by direct unfolding of
micro-rotations is possible, but that would increase the computational complexity and erode the advantage of
simplicity of the CORDIC algorithm [6]. Although no popular architectures are known to us for fully parallel
implementation of CORDIC, different forms of pipelined implementation have been proposed for improving
the computational throughput [7]. To handle latency bottlenecks, various architectures have been developed and
reported. Most of the well-known architectures can be grouped under bit-parallel iterative CORDIC,
bit-parallel unrolled CORDIC and bit-serial iterative CORDIC architectures.

II. CORDIC Algorithm

Keeping the requirements and constraints of different application environments in view, the
development of CORDIC algorithms and architectures has aimed at achieving a high throughput rate and
reducing hardware complexity as well as the latency of implementation. Some of the typical approaches for
reduced-complexity implementation focus on minimizing the complexity of the scaling operation and the
complexity of the barrel shifter in the CORDIC engine. Latency of implementation is an inherent drawback of the
conventional CORDIC algorithm. Parallel and pipelined CORDIC have been suggested for high-throughput
computation and efficient implementation of the CORDIC algorithm.




Figure 1: Vector Rotation 

The CORDIC algorithm has two computing modes: vector rotation (rotation mode) and vector
translation (vectoring mode). The CORDIC algorithm was initially designed to perform a vector rotation, where
the vector $V$ with components $(x, y)$ is rotated through the angle $\theta$, yielding a new vector $V'$ with
components $(x', y')$, as shown in Figure 1.



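$$V' = [R][V] \tag{1}$$

where $R$ is the rotation matrix:

$$R = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \tag{2}$$

$$V' = \begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \tag{3}$$

The individual equations for $x'$ and $y'$ can be rewritten as:

$$x' = x\cos(\theta) - y\sin(\theta) \tag{4}$$
$$y' = y\cos(\theta) + x\sin(\theta) \tag{5}$$

and rearranged so that

$$x' = \cos(\theta)\,[x - y\tan(\theta)] \tag{6}$$
$$y' = \cos(\theta)\,[y + x\tan(\theta)] \tag{7}$$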

The multiplication by the tangent term can be avoided if the rotation angles, and therefore $\tan(\theta)$, are
restricted so that $\tan(\theta) = 2^{-i}$. In digital hardware this denotes a simple shift operation.
Furthermore, if those rotations are performed iteratively and in both directions, every value of $\tan(\theta)$
is representable. With $\theta = \arctan(2^{-i})$ the cosine term can also be simplified and, since
$\cos(\theta) = \cos(-\theta)$, it is a constant for a fixed number of iterations. This iterative rotation can
now be expressed as:

$$x_{i+1} = k_i\,[x_i - y_i \cdot d_i \cdot 2^{-i}] \tag{8}$$
$$y_{i+1} = k_i\,[y_i + x_i \cdot d_i \cdot 2^{-i}] \tag{9}$$

where $k_i = \cos(\arctan(2^{-i}))$ and $d_i = \pm 1$. The product of the $k_i$'s represents the so-called K factor:

$$K = \prod_{i=0}^{n-1} k_i \tag{10}$$



This K factor can be calculated in advance and applied elsewhere in the system. Equations (8) and (9)
can now be simplified to the basic CORDIC equations:

$$x_{i+1} = x_i - y_i \cdot d_i \cdot 2^{-i} \tag{11}$$
$$y_{i+1} = y_i + x_i \cdot d_i \cdot 2^{-i} \tag{12}$$
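As a rough numerical check of equation (10), the K factor can be evaluated for a few iteration counts. The sketch below is only illustrative; the function name `k_factor` and the chosen values of n are assumptions, not part of the original design.

```python
import math

def k_factor(n):
    """Product of cos(arctan(2**-i)) for i = 0 .. n-1, as in equation (10)."""
    k = 1.0
    for i in range(n):
        k *= math.cos(math.atan(2.0 ** -i))
    return k

# The product settles quickly; its reciprocal is the gain of the
# unscaled iterations (11) and (12).
for n in (1, 4, 8, 16):
    k = k_factor(n)
    print(f"n={n:2d}  K={k:.6f}  1/K={1.0 / k:.6f}")
```

For 16 iterations the product is close to 0.6073, so its reciprocal is close to 1.647.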

The direction of each rotation is defined by $d_i$, and the sequence of all $d_i$'s determines the final vector.
Each vector $V$ can be described either by its length and angle or by its coordinates $x$ and $y$. Accordingly,
the CORDIC algorithm has two ways of determining the direction of rotation: the rotation
mode and the vectoring mode. Both methods initialize the angle accumulator with an angle $z_0$ (the desired
rotation angle in rotation mode). The rotation mode determines the right sequence as the angle accumulator
approaches 0, while the vectoring mode minimizes the $y$ component of the input vector.
The angle accumulator is defined by:

$$z_{i+1} = z_i - d_i \cdot \arctan(2^{-i}) \tag{13}$$

where the sum of an infinite number of iterative rotation angles equals the input angle $\theta$:

$$\theta = \sum_{i=0}^{\infty} d_i \cdot \arctan(2^{-i}) \tag{14}$$



Those values of $\arctan(2^{-i})$ can be stored in a small lookup table or hardwired, depending on the way
of implementation. Since the decision is which direction to rotate instead of whether to rotate or not, $d_i$ is
sensitive to the sign of $z_i$. Therefore $d_i$ can be described as:

$$d_i = \begin{cases} -1, & \text{if } z_i < 0 \\ +1, & \text{if } z_i \ge 0 \end{cases} \tag{15}$$



With equation (15) the CORDIC algorithm in rotation mode is described completely. Note that the
CORDIC method as described performs rotations only within $-\pi/2$ and $\pi/2$. This limitation comes from the
use of $2^0$ for the tangent in the first iteration. However, since a sine wave is symmetric from quadrant to
quadrant, every sine value from 0 to $2\pi$ can be represented by reflecting and/or inverting the first quadrant
appropriately.
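The rotation mode described by equations (11) to (15) can be sketched in a few lines of floating-point Python. This is a minimal illustration, not the implementation evaluated in this paper: the iteration count of 16, the names `cordic_sin_cos` and `sin_cos_full`, and the choice of pre-loading x with the K factor are all assumptions made for the example.

```python
import math

N_ITER = 16  # assumed number of iterations
ANGLES = [math.atan(2.0 ** -i) for i in range(N_ITER)]  # arctan lookup table

# K factor of equation (10), used to pre-scale the start vector.
K = 1.0
for a in ANGLES:
    K *= math.cos(a)

def cordic_sin_cos(theta):
    """Rotation mode for |theta| <= pi/2: returns (cos(theta), sin(theta))."""
    x, y, z = K, 0.0, theta
    for i in range(N_ITER):
        d = 1.0 if z >= 0 else -1.0                          # equation (15)
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i  # (11), (12)
        z -= d * ANGLES[i]                                   # equation (13)
    return x, y

def sin_cos_full(theta):
    """Extend the range to [-pi, pi] by shifting the angle by pi and negating."""
    if theta > math.pi / 2:
        c, s = cordic_sin_cos(theta - math.pi)
        return -c, -s
    if theta < -math.pi / 2:
        c, s = cordic_sin_cos(theta + math.pi)
        return -c, -s
    return cordic_sin_cos(theta)

c, s = sin_cos_full(2.5)
print(c, math.cos(2.5))
print(s, math.sin(2.5))
```

The multiplications by `2.0 ** -i` stand in for the shift operations of a hardware realization; the wrapper uses the identities cos(θ) = -cos(θ - π) and sin(θ) = -sin(θ - π) as one possible form of the quadrant mapping mentioned above.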

In vector translation, the vector $V$ with components $(X, Y)$ is rotated around the circle until the $Y$
component equals zero, as illustrated in Figure 2. The outputs from vector translation are the magnitude $X'$
and phase $z'$ of the input vector $V$ with components $(X, Y)$.



Figure 2: Vector Translation

After vector translation, output equations are: 




$$X' = k\sqrt{X^2 + Y^2} \tag{16}$$

$$Y' = 0 \tag{17}$$

$$z' = \arctan\left(\frac{Y}{X}\right) \tag{18}$$

where $k$ is the CORDIC gain (approximately 1.647), i.e. the reciprocal of the K factor of equation (10).
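The vectoring outputs above can be reproduced with the same shift-and-add iteration, choosing the direction from the sign of y instead of the sign of z. The following is a hedged floating-point sketch; the name `cordic_vector`, the iteration count, the zero initial angle and the final division by the gain are assumptions for illustration only.

```python
import math

N_ITER = 16  # assumed number of iterations
ANGLES = [math.atan(2.0 ** -i) for i in range(N_ITER)]

# Gain of the unscaled iterations (about 1.647).
GAIN = 1.0
for i in range(N_ITER):
    GAIN *= math.sqrt(1.0 + 2.0 ** (-2 * i))

def cordic_vector(x, y):
    """Vectoring mode for x > 0: returns (magnitude, phase) of (x, y)."""
    z = 0.0
    for i in range(N_ITER):
        d = -1.0 if y >= 0 else 1.0   # rotate so that y is driven towards zero
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]            # accumulate the applied rotation
    return x / GAIN, z                # remove the gain from the magnitude

mag, phase = cordic_vector(3.0, 4.0)
print(mag, phase, math.atan2(4.0, 3.0))
```

Dividing by `GAIN` at the end corresponds to a separate scale compensation step; a design that tolerates the scaled magnitude could omit it.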



To simplify the hardware realization of the rotation, the key ideas used in CORDIC arithmetic
are to decompose the rotation into a sequence of elementary rotations through predefined angles that can be
implemented with minimum hardware cost, and to avoid scaling, which might involve arithmetic operations such as
square root and division. The second idea is based on the fact that the scale factor contains only the magnitude
information and no information about the angle of rotation.

In 1971, John S. Walther found how CORDIC iterations could be modified to compute hyperbolic 
functions and reformulated the CORDIC algorithm into a generalized and unified form which is suitable to 
perform rotations in circular, hyperbolic and linear coordinate systems. The unified formulation includes a new 
variable m , which is assigned different values for different coordinate systems. The generalized CORDIC is 
formulated as follows: 



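$$\begin{aligned}
x_{i+1} &= x_i - m\,\sigma_i\,2^{-i}\,y_i \\
y_{i+1} &= y_i + \sigma_i\,2^{-i}\,x_i \\
w_{i+1} &= w_i - \sigma_i\,\alpha_i
\end{aligned} \tag{19}$$

where

$$\sigma_i = \begin{cases} \operatorname{sign}(w_i) & \text{for rotation mode} \\ -\operatorname{sign}(y_i) & \text{for vectoring mode} \end{cases}$$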

For $m = 1$, $0$ or $-1$ and $\alpha_i = \tan^{-1}(2^{-i})$, $2^{-i}$ or $\tanh^{-1}(2^{-i})$, the algorithm
given by (19) works in circular, linear or hyperbolic coordinate systems, respectively. Table 1 summarizes the
operations that can be performed in rotation and vectoring modes in each of these coordinate systems. The
convergence range of linear and hyperbolic CORDIC is obtained, as in the case of the circular coordinate system,
by the sum of all $\alpha_i$, given by $\sum_{i=0}^{\infty} \alpha_i$.
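To make the role of the variable m more concrete, the unified iteration can be sketched for the circular (m = 1) and linear (m = 0) systems. The hyperbolic case is left out here because it needs repeated iterations. The function name `unified_cordic`, the argument layout and the default of 24 iterations are assumptions made for this example.

```python
import math

def unified_cordic(x, y, w, m, vectoring=False, n=24):
    """Unified CORDIC iteration for m = 1 (circular) or m = 0 (linear)."""
    for i in range(n):
        # Elementary angle: arctan(2^-i) for circular, 2^-i for linear.
        alpha = math.atan(2.0 ** -i) if m == 1 else 2.0 ** -i
        if vectoring:
            sigma = -1.0 if y >= 0 else 1.0   # drive y towards zero
        else:
            sigma = 1.0 if w >= 0 else -1.0   # drive w towards zero
        x, y = x - m * sigma * 2.0 ** -i * y, y + sigma * 2.0 ** -i * x
        w -= sigma * alpha
    return x, y, w

# Linear rotation mode behaves like a multiplier: y ends near y0 + x0 * w0.
print(unified_cordic(3.0, 0.0, 1.25, m=0))
# Linear vectoring mode behaves like a divider: w ends near w0 + y0 / x0.
print(unified_cordic(4.0, 3.0, 0.0, m=0, vectoring=True))
```

With iterations starting at i = 0, the linear mode converges only while the multiplier or quotient stays within about ±2, which is the sum of all elementary angles in that system.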

Table 1 Generalized CORDIC Algorithm 



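| m | Output | Rotation mode | Vectoring mode |
|---|---|---|---|
| 1 | $x_n$ | $K_c(x_0\cos w_0 - y_0\sin w_0)$ | $K_c\sqrt{x_0^2 + y_0^2}$ |
| 1 | $y_n$ | $K_c(x_0\sin w_0 + y_0\cos w_0)$ | $0$ |
| 1 | $w_n$ | $0$ | $w_0 + \tan^{-1}(y_0/x_0)$ |
| 0 | $x_n$ | $x_0$ | $x_0$ |
| 0 | $y_n$ | $y_0 + x_0 w_0$ | $0$ |
| 0 | $w_n$ | $0$ | $w_0 + (y_0/x_0)$ |
| -1 | $x_n$ | $K_h(x_0\cosh w_0 + y_0\sinh w_0)$ | $K_h\sqrt{x_0^2 - y_0^2}$ |
| -1 | $y_n$ | $K_h(x_0\sinh w_0 + y_0\cosh w_0)$ | $0$ |
| -1 | $w_n$ | $0$ | $w_0 + \tanh^{-1}(y_0/x_0)$ |

Here $K_c = \prod_i (1 + 2^{-2i})^{1/2} \approx 1.647$ is the circular CORDIC gain (the reciprocal of the K factor
of equation (10)) and $K_h$ is the hyperbolic scale factor.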



The hyperbolic CORDIC requires the iterations for $i = 4, 13, 40, \ldots$ to be executed twice to ensure
convergence. Consequently, these repetitions must be considered while computing the scale factor
$K_h = \prod_i (1 - 2^{-2i})^{1/2}$, which converges to 0.8281.
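The effect of the repeated iterations on the hyperbolic scale factor can be checked numerically. The sketch below assumes that hyperbolic iterations start at i = 1 and that only the indices 4, 13 and 40 listed above are repeated; the helper names are hypothetical.

```python
import math

def hyperbolic_schedule(n_max, repeats=(4, 13, 40)):
    """Iteration indices 1 .. n_max, with the listed indices executed twice."""
    seq = []
    for i in range(1, n_max + 1):
        seq.append(i)
        if i in repeats:
            seq.append(i)
    return seq

def hyperbolic_scale(n_max):
    """Product of sqrt(1 - 2^(-2i)) over the schedule, repeats included."""
    kh = 1.0
    for i in hyperbolic_schedule(n_max):
        kh *= math.sqrt(1.0 - 2.0 ** (-2 * i))
    return kh

print(hyperbolic_schedule(6))
print(f"{hyperbolic_scale(20):.5f}")
```

Leaving out the repeated index 4 changes the product in the third decimal place, which is why the repetitions have to be included when the scale factor is precomputed.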



III. FPGA Implementation of CORDIC Algorithm for Sin and Cos Functions

CORDIC can be used to compute the Sin of any angle $\theta$ with little variation. The angle is given as input.
A vector of length 1/1.647 (the reciprocal of the CORDIC gain 1.647) along the x-axis is taken, so that the gain
of the iterations is compensated. The vector is then rotated in steps so as to reach the desired input angle
$\theta$. The x and y values are accumulated. After a fixed number of iterations, the final coordinates of the
vector, i.e. the x and y values, give the values of Cos and Sin respectively of the given angle $\theta$. When
the Sin Cos functional configuration is selected, the unit vector is rotated, using the CORDIC algorithm, by the
input angle $\theta$. This generates the output vector $(\cos(\theta), \sin(\theta))$. The compensation scaling
module is disabled for the Sin and Cos functional configuration, as it is internally pre-scaled to compensate for
the CORDIC scale factor.
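A software model of the shift-and-add datapath may help to see what the hardware computes. The sketch below is an integer model only, not the FPGA design reported here: the 14 fractional bits, the 14 iterations and all names are assumptions, and Python's `>>` is used as an arithmetic right shift.

```python
import math

FRAC_BITS = 14            # assumed number of fractional bits
N_ITER = 14               # assumed number of iterations
ONE = 1 << FRAC_BITS

# Arctangent lookup table in fixed point.
ATAN_TABLE = [round(math.atan(2.0 ** -i) * ONE) for i in range(N_ITER)]

# Pre-scaled start value: reciprocal of the CORDIC gain in fixed point.
K = 1.0
for i in range(N_ITER):
    K /= math.sqrt(1.0 + 2.0 ** (-2 * i))
X0 = round(K * ONE)

def cordic_sin_cos_fixed(theta):
    """Integer rotation mode for |theta| <= pi/2: returns (cos, sin) as floats."""
    x, y = X0, 0
    z = round(theta * ONE)
    for i in range(N_ITER):
        if z >= 0:
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_TABLE[i]
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_TABLE[i]
    return x / ONE, y / ONE

c, s = cordic_sin_cos_fixed(math.pi / 6)
print(c, s)
```

Only additions, subtractions, shifts and a table lookup appear inside the loop. A serial architecture would reuse one such stage for every iteration, while an unrolled or pipelined one would instantiate a stage per iteration.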

Table 2 Resource Utilization of Sin and Cos Functional Configuration

| Parameters | Serial | Parallel (No Pipeline) | Parallel (Pipeline) |
|---|---|---|---|
| No. of slice f/f | 349 (22%) | 66 (4%) | 1146 (74%) |
| No. of 4 i/p LUTs | 472 (30%) | 1006 (65%) | 1020 (66%) |
| No. of occupied slices | 330 (42%) | 564 (73%) | 623 (81%) |
| No. of slices containing only related logic | 330 (100%) | 564 (100%) | 623 (100%) |
| Total no. of 4 i/p LUTs | 595 (38%) | 1087 (70%) | 1123 (73%) |

Figure 3: Resource Utilization of Sin and Cos functional Configuration in serial architecture

Figure 4: Resource Utilization of Sin and Cos functional Configuration in parallel architecture (no pipeline)

Figure 5: Resource Utilization of Sin and Cos functional Configuration in parallel architecture (pipeline)



From Table 2 and Figures 3-5, it can be concluded that the parallel architecture with pipelining uses 74% of
the slice flip-flops, compared to 4% used by the parallel architecture without pipelining and 22% used by the
serial architecture. The parallel architecture with pipelining uses 66% of the 4-input LUTs, whereas the
parallel architecture without pipelining uses 65% and the serial architecture uses 30%. The parallel
architectures with and without pipelining occupy 81% and 73% of the slices, respectively, while the serial
architecture occupies 42%.
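The relative cost of the three architectures can also be expressed as ratios to the serial design. The short script below only rearranges the counts already listed in Table 2; the dictionary layout and the helper name `ratios_to_serial` are assumptions made for illustration.

```python
# Counts copied from Table 2: (serial, parallel without pipeline, parallel with pipeline).
TABLE2 = {
    "slice flip-flops": (349, 66, 1146),
    "4-input LUTs": (472, 1006, 1020),
    "occupied slices": (330, 564, 623),
    "total 4-input LUTs": (595, 1087, 1123),
}

def ratios_to_serial(counts):
    """Return (parallel / serial, pipelined / serial) for one resource type."""
    serial, parallel, pipelined = counts
    return parallel / serial, pipelined / serial

for name, counts in TABLE2.items():
    par, pipe = ratios_to_serial(counts)
    print(f"{name:20s} parallel/serial = {par:.2f}  pipelined/serial = {pipe:.2f}")
```

Read this way, the non-pipelined parallel design needs fewer flip-flops than the serial one but roughly twice the LUTs, while pipelining mainly adds flip-flops.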

IV. Conclusion 

From the above discussion, although the parallel architecture with pipelining is costlier in resources than
the parallel architecture without pipelining and the serial architecture, the parallel architecture offers
higher throughput (i.e. speed) than the serial architecture.